Attribute-Value Selection Based on Minimum Description Length
Abstract
We introduce a new method for attribute-value selection driven by the minimum description length (MDL) principle. We demonstrate the viability of the approach on the Wisconsin breast cancer data set, present a worked example, and evaluate the approach against earlier systems. Comparisons on other domains are also given. Empirical results show that our approach consistently outperforms competing machine learning algorithms on domains with all-numeric, all-discrete, and mixed attribute types.
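The abstract does not spell out the selection criterion, so the sketch below is only a generic illustration of the two-part MDL idea (bits to describe the model plus bits to describe the data given the model), not the authors' algorithm. The function names `label_bits` and `best_cut`, the use of a single binary cut on one numeric attribute, and the uniform code over candidate cuts are all assumptions made for illustration.

```python
import math
from collections import Counter


def label_bits(labels):
    """Bits to encode the class labels with an ideal code: n * H(labels)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum(c * math.log2(c / n) for c in Counter(labels).values())


def best_cut(values, labels):
    """Pick the cut on a numeric attribute with the smallest total description length.

    Total length = model bits (which cut was chosen) + data bits (labels on each side).
    Returns (cut, total_bits); cut is None when no split beats leaving the data unsplit.
    Simplification: the unsplit baseline is charged zero model bits.
    """
    pairs = list(zip(values, labels))
    distinct = sorted(set(values))
    # Candidate cuts are midpoints between consecutive distinct values.
    candidates = [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]
    best = (None, label_bits(labels))
    if not candidates:
        return best
    # Assumed uniform code over candidates: log2(k) bits to name one of k cuts.
    model_bits = math.log2(len(candidates))
    for cut in candidates:
        left = [y for x, y in pairs if x <= cut]
        right = [y for x, y in pairs if x > cut]
        total = model_bits + label_bits(left) + label_bits(right)
        if total < best[1]:
            best = (cut, total)
    return best


if __name__ == "__main__":
    # Toy data (hypothetical, not from the paper): two well-separated classes.
    cut, bits = best_cut([1, 2, 3, 10, 11, 12], ["a", "a", "a", "b", "b", "b"])
    print(cut, round(bits, 3))
```

On the toy data the cut between 3 and 10 separates the classes perfectly, so the only remaining cost is the bits needed to name the cut. A real system would need a more careful model code, for example one that also charges for the class distributions on each side.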
Related resources
Attribute Value Selection Considering the Minimum Description Length Approach and Feature Granularity
In this paper we introduce a new approach to automatic attribute and granularity selection for building optimum regression trees. The method is based on the minimum description length principle (MDL) and aspects of granular computing. The approach is verified by giving an example using a data set which is extracted and preprocessed from an operational information system of the Components Toolsh...
Bayesian Models to Assess Risk of Corruption of Federal Management Units
This paper presents a data mining project that generated Bayesian models to assess risk of corruption of federal management units. With thousands of extracted features related to corruptibility, the data were processed using techniques like correlation analysis and variance per class. We also compared two different discretization methods: Minimum Description Length Principle (MDLP) and Class-At...
The Cruncher: Automatic Concept Formation Using Minimum Description Length
We present The Cruncher, a simple representation framework and algorithm based on minimum description length for automatically forming an ontology of concepts from attribute-value data sets. Although unsupervised, when The Cruncher is applied to an animal data set, it produces a nearly zoologically accurate categorization. We demonstrate The Cruncher’s utility for finding useful macro-actions i...
Adjusting the Spanner: Testing an Evidence Accumulation Model of Decision Making
An experiment examined two aspects of performance in a multi-attribute inference task: i) the effect of stimulus presentation format (image or text) on the adoption of decision strategies; and ii) the ability of an evidence accumulation model, which unifies take-the-best (TTB) and rational (RAT) strategies, to explain participants’ judgments. Presentation format had no significant effect on str...
MDL-Based Unsupervised Attribute Ranking
In the present paper we propose an unsupervised attribute ranking method based on evaluating the quality of clustering that each attribute produces by partitioning the data into subsets according to its values. We use the Minimum Description Length (MDL) principle to evaluate the quality of clustering and describe an algorithm for attribute ranking and a related clustering algorithm. Both algor...